# Dataflow Configuration

Dora dataflows are defined and configured through a YAML file.
This dataflow configuration file specifies the nodes of the dataflow and their inputs and outputs.
It also allows configuring communication parameters and enabling debug options.

This article provides an introduction to the dataflow file format and its most important fields.
For a complete reference of all available fields their behavior check ouf the documentation of the
[`Descriptor`](https://docs.rs/dora-core/0.3.12/dora_core/descriptor/struct.Descriptor.html) and
[`Node`](https://docs.rs/dora-core/0.3.12/dora_core/descriptor/struct.Node.html) structs.

## Defining Nodes

The most important field in a dataflow configuration file is the `nodes` field, which lists the
nodes of the dataflow.
Each node is identified by a unique `id`:

```yaml
nodes:
    - id: foo
      path: path/to/the/executable
      # ... (see below)
    - id: bar
      path: path/to/another/executable
      # ... (see below)
```

For each node, you need to specify the `path` of the executable or script that Dora should run when starting the node.
Most of the other node fields are optional, but you typically want to specify at least some `inputs` and/or `outputs`.

### Inputs and Outputs

Nodes can send output messages that can be received by other nodes as input.
All inputs and outputs need to be specified in the dataflow configuration file.

For each node, list all output IDs that it sends under the
[`outputs`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.outputs)
key.
Only the specified output IDs are valid to be used in output sending functions such as
[`send_output`](https://docs.rs/dora-node-api/latest/dora_node_api/struct.DoraNode.html#method.send_output).

Receiving nodes can subscribe to outputs by listing them in their
[`inputs`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.inputs)
field.
The `inputs` field should be a key-value map of the following format:
`input_id: source_node_id/source_node_output_id`

The components are defined as follows:

-   `input_id` is the local identifier that should be used for this input.
    This will map to the `id` field of
    [`Event::Input`](https://docs.rs/dora-node-api/latest/dora_node_api/enum.Event.html#variant.Input)
    events sent to the node event loop.
-   `source_node_id` should be the `id` field of the node that sends the output that we want
    to subscribe to
-   `source_node_output_id` should be the identifier of the output that that we want
    to subscribe to

#### Input/Output Example

```yaml
nodes:
    - id: example-node
      outputs:
          - one
          - two
    - id: receiver
      inputs:
          my_input: example-node/two
```

### Fields Controlling Node Execution

Use the following fields to define how a node is executed, including command-line arguments and environment
variables.

##### `path` (required)

Specifies the path of the executable or script that Dora should run when starting the dataflow.
This can point to a normal executable (e.g. when using a compiled language such as Rust) or a Python script.

```yaml
nodes:
    - id: rust-example
      path: target/release/rust-node
    - id: python-example
      path: ./receive_data.py
```

See the
[`path` field documentation](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.path)
for details.

##### `args` and `env`

Use the
[`args`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.args)
field to specify command-line arguments that should be passed to the executable/script specified in
`path`.
Use the
[`env`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.env)
field for setting environment variables.

```yaml
nodes:
    - id: example
      path: example-node
      args: -v --some-flag foo
      env:
          IMAGE_WIDTH: 640
          IMAGE_HEIGHT: 480
```

### Fields Controlling Node Build

Use build fields define how a node is set up and built on `dora build`.
All build fields are optional.

##### `build`

The
[`build`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.build)
field specifies the command that should be invoked for building the node.

```yaml
- id: build-example
  build: cargo build -p receive_data --release
  path: target/release/receive_data
- id: multi-line-example
  build: |
      pip install flash-attn
      pip install -e ../../node-hub/dora-phi4
  path: dora-phi4
```

**Special treatment of `pip`:** Build lines that start with `pip` or `pip3` are treated in a
special way:
If the `--uv` argument is passed to the `dora build` command, all `pip`/`pip3` commands are run
through the [`uv` package manager](https://docs.astral.sh/uv/).

#### `git`

The [`git`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.git)
field allows downloading nodes from git repositories.
This can be especially useful for distributed dataflows.

When a `git` key is specified, `dora build` automatically clones the specified repository (or reuse
an existing clone).
Then it checks out the specified [`branch`](#branch), [`tag`](#tag), or [`rev`](#rev), or the
default branch if none of them are specified.
Afterwards it runs the `build` command if specified.

```yaml
nodes:
    - id: rust-node
      git: https://github.com/dora-rs/dora.git
      branch: main
      build: cargo build -p rust-dataflow-example-node
      path: target/debug/rust-dataflow-example-node
```

## Operators

Operators are an experimental, lightweight alternative to nodes.
Instead of running as a separate process, operators are linked into a runtime process.
This allows running multiple operators to share a single address space (not supported for Python currently).

Operators are defined as part of the node list, as children of a runtime node.
A runtime node is a special node that specifies no `path` field, but contains an
[`operators`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Node.html#structfield.operators)
field instead.

## Other Dataflow Fields

See the [`Descriptor`](https://docs.rs/dora-core/latest/dora_core/descriptor/struct.Descriptor.html)
struct for a full list of supported fields.

## Full Specification

```mdx-code-block
import SchemaViewer from "@site/src/components/SchemaViewer";

<SchemaViewer />
```
